Inside Macintosh: Programming With the Text Encoding Conversion Manager / Chapter 3 - Text Encoding Converter Reference / Text Encoding Converter Functions / Investigating Encodings


TECSniffTextEncoding

Sniffs a text stream of unknown encoding against an array of candidate encodings and returns the candidates ranked from most likely to least likely.

pascal OSStatus TECSniffTextEncoding (
                     TECSnifferObjectRef encodingSniffer, 
                     TextPtr inputBuffer, 
                     ByteCount inputBufferLength, 
                     TextEncoding testEncodings[], 
                     ItemCount numTextEncodings, 
                     ItemCount numErrsArray[], 
                     ItemCount maxErrs, 
                     ItemCount numFeaturesArray[], 
                     ItemCount maxFeatures);
encodingSniffer
A reference to a sniffer object, such as one created by TECCreateSniffer.
inputBuffer
The text to be sniffed.
inputBufferLength
The length of the input buffer, in bytes.
testEncodings[]
An array of text encoding specifications. On input, specify the text encodings you want to test for. On output, the array contains the same encodings reordered from most likely to least likely.
numTextEncodings
A value of type ItemCount giving the number of entries in the testEncodings[] array.
numErrsArray[]
An array of type ItemCount with at least numTextEncodings elements. On return, numErrsArray[] holds the number of errors found for each candidate encoding, in the same order as the reordered testEncodings[] array.
maxErrs
The maximum number of errors to count for each encoding. When a sniffer reaches this number while filling in numErrsArray[], it stops examining that encoding.
numFeaturesArray[]
An array of type ItemCount with at least numTextEncodings elements. On return, numFeaturesArray[] holds the number of features found for each candidate encoding, in the same order as the reordered testEncodings[] array.
maxFeatures
The maximum number of features to count for each encoding. When a sniffer reaches this number while filling in numFeaturesArray[], it stops examining that encoding.
function result
A result code. See "Text Encoding Conversion Manager Result Codes" (page 42) for a list of possible values. Any result other than noErr means that one of the conversion plug-ins accessed by the converter encountered an error while executing a sniffer function.
DISCUSSION
For a specified stream of bytes in an unknown encoding and an array of possible encodings, TECSniffTextEncoding returns counts of "errors" and "features" for each of the encodings. Each error indicates a code point or sequence that is illegal in the specified encoding, and a feature indicates the presence of a sequence that is characteristic of that encoding. Table 3-1 shows sample output from a sniffer run.

Table 3-1 Sample sniffer output

Encoding          Errors   Features
EUC                    0          8
JIS                    0          0
Mac OS Japanese       20         20

For example, the byte sequence 0x8A 0xBF 0x8E 0x9A, which Mac OS Roman displays as "äøéö", is legal both as Mac OS Roman text and as Mac OS Japanese text. Both sniffers would return zero errors, but the Mac OS Japanese sniffer would also return two features (representing two legal 2-byte characters).
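The error and feature counts in this example can be illustrated with a toy sniffer. The sketch below is hypothetical: it is not the Mac OS Japanese plug-in, and it uses simplified Shift-JIS byte ranges (lead bytes 0x81 to 0x9F and 0xE0 to 0xFC, trail bytes 0x40 to 0x7E and 0x80 to 0xFC) that ignore the extra single-byte characters Mac OS Japanese defines. The function name sniff_sjis_like and its two cap parameters, modeled on maxErrs and maxFeatures, are assumptions for illustration only.

```c
#include <stddef.h>

/* Hypothetical, simplified Shift-JIS-style rules; NOT the actual
   Mac OS Japanese sniffer plug-in. */
static int is_lead(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

static int is_trail(unsigned char b) {
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

static int is_single(unsigned char b) {
    /* ASCII range plus half-width katakana */
    return b <= 0x7F || (b >= 0xA1 && b <= 0xDF);
}

/* Counts errors (illegal bytes or sequences) and features (legal 2-byte
   characters). Scanning stops once either count reaches its cap, in the
   spirit of the maxErrs and maxFeatures parameters. */
void sniff_sjis_like(const unsigned char *buf, size_t len,
                     unsigned long maxErrs, unsigned long maxFeatures,
                     unsigned long *errs, unsigned long *features)
{
    size_t i = 0;
    *errs = 0;
    *features = 0;
    while (i < len && *errs < maxErrs && *features < maxFeatures) {
        unsigned char b = buf[i];
        if (is_lead(b)) {
            if (i + 1 < len && is_trail(buf[i + 1])) {
                (*features)++;   /* legal 2-byte character */
                i += 2;
            } else {
                (*errs)++;       /* lead byte without a valid trail byte */
                i += 1;
            }
        } else if (is_single(b)) {
            i += 1;              /* legal 1-byte character: neither error nor feature */
        } else {
            (*errs)++;           /* byte not legal in this simplified scheme */
            i += 1;
        }
    }
}
```

Called on the four bytes 0x8A 0xBF 0x8E 0x9A with generous caps, this sketch reports 0 errors and 2 features, matching the example; lowering the feature cap to 1 makes it stop after the first 2-byte character.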

The arrays are returned as a ranked list with the most likely text encodings first. The results are sorted first by number of errors (fewest to most), then by number of features (most to fewest), and finally by their original order in the list. On return, the correct encoding is most likely in testEncodings[0], or failing that in testEncodings[1].

If any of the encodings in testEncodings[] are not examined, their error and feature counts are set to 0xFFFFFFFF, so they sort to the end of the list.
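The ranking rule can be expressed as a stable sort over the three parallel arrays. This is a hypothetical sketch of the documented ordering, not the Text Encoding Converter's own code: uint32_t stands in for the TextEncoding and ItemCount types, and rank_encodings is an illustrative name. Unexamined entries, whose counts are 0xFFFFFFFF, fall to the end because their error count is the largest possible value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the documented ranking; uint32_t stands in for
   the TextEncoding and ItemCount types. */

/* Nonzero if candidate A should rank strictly ahead of candidate B. */
static int ranks_before(uint32_t errA, uint32_t featA,
                        uint32_t errB, uint32_t featB)
{
    if (errA != errB)
        return errA < errB;      /* fewer errors first */
    if (featA != featB)
        return featA > featB;    /* more features first */
    return 0;                    /* full tie: keep original order */
}

/* Stable insertion sort applied to three parallel arrays. Entries left at
   0xFFFFFFFF (not examined) move to the end because that error count is
   the largest possible value. */
void rank_encodings(uint32_t enc[], uint32_t errs[], uint32_t feats[], size_t n)
{
    for (size_t i = 1; i < n; i++) {
        uint32_t e = enc[i], er = errs[i], fe = feats[i];
        size_t j = i;
        /* Shift earlier entries right only while the new one strictly
           outranks them, so ties keep their original order. */
        while (j > 0 && ranks_before(er, fe, errs[j - 1], feats[j - 1])) {
            enc[j] = enc[j - 1];
            errs[j] = errs[j - 1];
            feats[j] = feats[j - 1];
            j--;
        }
        enc[j] = e;
        errs[j] = er;
        feats[j] = fe;
    }
}
```

Given the counts from Table 3-1 in any input order, this sketch yields EUC, JIS, Mac OS Japanese. Insertion sort is chosen here only because it is simple and stable for short candidate lists; any stable sort using the same comparison would produce the same ranking.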

SEE ALSO
The function TECCountAvailableSniffers (page 83)

The function TECGetAvailableSniffers (page 84)

The function TECCreateSniffer (page 85)


© Apple Computer, Inc.
13 NOV 1997